Acquisition and Annotation of Slovenian Broadcast News Database
Abstract
This paper presents the Slovenian Broadcast News Database project, which started in 2002 as a cooperation between the University of Maribor and the Slovenian national broadcaster RTV Slovenia. The resulting database will be used for large vocabulary continuous speech recognition and for multimedia database retrieval or archive indexing. First, some organizational aspects that had to be addressed in the initial phase of the project are described. The raw audio and video material was acquired from the original analog Beta SP master tapes preserved in the RTV Slovenia archive and was copied to DAT and DVD media. Additional teletext material was also collected. The speech material is annotated manually with the Transcriber tool. The annotation rules were defined on the basis of general rules for Broadcast News databases, with some special language-dependent sections. Some statistics on part of the current material are also given.
Similar resources
Development of Slovenian Broadcast News Speech Database
The paper reviews the development of a new Slovenian broadcast news speech database. The database consists of audio, video and annotation transcripts of about 34 hours of television daily news program captured from the public TV station RTVSLO. The paper addresses issues concerning transcription and annotation of the collected data, provides information on content analysis and basic statistics ...
BNSI Slovenian broadcast news database - speech and text corpus
This paper presents the BNSI Slovenian Broadcast News database project. The result of the project is a database with speech and text corpus oriented toward large vocabulary continuous speech recognition in general domain. The speech corpus consists of 36 hours of transcribed evening and late night news. The raw database material was captured in the archive of national broadcaster RTV Slovenia t...
Acquisition and Annotation of Slovenian Lombard Speech Database
This paper presents the acquisition and annotation of the Slovenian Lombard Speech Database, the recording of which started in 2008. The database was recorded at the University of Maribor, Slovenia. The goal of this paper is to describe the hardware platform used for the acquisition of speech material, the recording scenarios and the tools used for the annotation of Slovenian Lombard Speech Databa...
SINOD - Slovenian non-native speech database
This paper presents the SINOD database, which is the first Slovenian non-native speech database. It will be used to improve the performance of a large vocabulary continuous speech recogniser for non-native speakers. The main quality impact is expected for the acoustic models and the recogniser’s vocabulary. The SINOD database is designed as a supplement to the Slovenian BNSI Broadcast News database. The sa...
The COST278 Pan-European Broadcast News Database
This paper describes a pan-European multilingual audio and video database of broadcast news shows. The database was constructed by seven institutions that are collaborating in the European COST278 action on Spoken Language Interaction in Telecommunications. At present, the database comprises broadcast news shows in seven languages, namely Dutch, Portuguese, Galician, Czech, Slovenian, Slovakian...